
Poster: DannyDaemonic Date: Apr 27, 2008 12:59pm
Forum: web Subject: Re: Robots.txt Policy is a Failure!

Hey, that's exactly what I was thinking. This isn't fair at all. The problem with your new approach is that sometimes people will want their previous pages taken down due to some embarrassment or vandalism.

Probably the best solution would be, as you suggested, to never retroactively remove pages because of a robots.txt, but also to copy Google's page verification (where you host a webpage with a certain name) and, once you're verified, let you delete any or all archived dates. This prevents owners from destroying their history with a bad robots.txt, and lets people remove just the vandalized date rather than everything at once.
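
A minimal sketch of how that verification could work (the file name and token below are made up for illustration, not any real Archive feature):

    1. The Archive hands the claimed owner a random token, say a1b2c3.
    2. The owner puts an empty page at http://example.com/archive-verify-a1b2c3.html
    3. The Archive fetches that URL. Only someone who controls the site could have
       put the file there, so the owner is verified and can then remove individual
       archived dates instead of nuking everything.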


Poster: mucizeurunler Date: Jul 20, 2008 1:19pm
Forum: web Subject: Re: Robots.txt Policy is a Failure!

The robots.txt file is very important for Googlebot.

[example website]

This post was modified by Detective John Carter of Mars on 2008-07-20 20:19:34


Poster: DannyDaemonic Date: Jul 3, 2008 2:37pm
Forum: web Subject: Re: Robots.txt Policy is a Failure!

Apparently you don't know what the Internet Archive's robots.txt policy is. This has nothing to do with Google or search engines. No one here is trying to stop robots.txt from doing what it's supposed to do. We are simply unhappy with the way the Internet Archive uses robots.txt. If someone tells robots not to index a page, the Internet Archive will delete that page from its archive -- not just that version of the page, ALL previous versions of the page. And it's gone forever, so if you're a webmaster and you block all these search bots because you're under heavy load, you'll accidentally and permanently erase all Internet Archive copies of your website. Also, sites sometimes change hands; new owners may not want their pages backed up by the Internet Archive, but the only way to stop that is to also erase all previous archives.

This policy is what we're calling a failure. A robots.txt can delete all previous archives of a page, there's no fine-grained control, and it's too easy for a website to accidentally erase its entire history.
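
To make the failure mode concrete, here's a sketch in standard robots.txt syntax (ia_archiver is the user-agent the Internet Archive's crawler has historically gone by; treat the exact token as an assumption). A webmaster just trying to shed crawler load might publish:

    # Blocks every compliant robot -- and, under the current policy,
    # also wipes the site's entire history out of the archive.
    User-agent: *
    Disallow: /

If the Archive instead acted only on a rule aimed specifically at its own crawler, the same webmaster could opt out going forward without touching the past:

    # Blocks only the Internet Archive's crawler, nothing else.
    User-agent: ia_archiver
    Disallow: /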


Poster: Robin_1990 Date: May 7, 2008 8:25am
Forum: web Subject: Robots.txt Policy makes little baby Gohan cry!

It's true.


Poster: MSRX92 Date: May 7, 2008 1:20pm
Forum: web Subject: Re: Robots.txt Policy makes little baby Gohan cry!

Surely there is a different way to do this. Like, if I take a picture of someone's house, and the house is sold, the new owner can't demand that I no longer show anyone a picture of that house.


Poster: DannyDaemonic Date: May 8, 2008 12:38am
Forum: web Subject: Re: Robots.txt Policy makes little baby Gohan cry!

This is the best idea I can come up with, and it's certainly better than what we currently have. Some might argue it's completely fair. I'm not one of those people, but I could see this making a certain group of people 100% happy. Perhaps even the lawyers would okay this one.

The major problem my solution solves is people putting up robots.txt files that deny all robots in an attempt to save on bandwidth while they decide what to do with their website, or squat on it, or resell it, or whatever. The accidental side effect is that they delete their archive history for the whole site.

Knowing this has been stopped should comfort you, at least a little, since there are no more accidental erasures (and most people don't even know about the Internet Archive). Of course, you are always free to archive your own website on CD or DVD or whatever medium you choose. That will have to be your picture for now.